Abstract

In 1977, Baillon and Haddad proved that if the gradient of a convex and continuously differentiable function is nonexpansive, then it is actually firmly nonexpansive. This result, which has become known as the Baillon-Haddad theorem, has found many applications in optimization and numerical functional analysis. In this note, we propose short alternative proofs of this result and strengthen its conclusion.

1 Introduction

Throughout, $\mathcal{H}$ is a real Hilbert space with scalar product $\langle \cdot \mid \cdot \rangle$ and induced norm $\|\cdot\|$. Let $C$ be a nonempty subset of $\mathcal{H}$, let $T\colon C \to \mathcal{H}$, and let $\beta \in \left]0,+\infty\right[$. Then $T$ is $1/\beta$-cocoercive (a property also known as the Dunn property or inverse strong monotonicity) if

\[ (\forall x \in C)(\forall y \in C) \quad \beta \langle x - y \mid Tx - Ty \rangle \geq \|Tx - Ty\|^2, \tag{1} \]

and $T$ is $\beta$-Lipschitz continuous if

\[ (\forall x \in C)(\forall y \in C) \quad \|Tx - Ty\| \leq \beta \|x - y\|. \tag{2} \]

When $\beta = 1$, (1) means that $T$ is firmly nonexpansive and (2) that $T$ is nonexpansive. Cocoercivity arises in various areas of optimization and nonlinear analysis, e.g., [1, 15, 20, 23]. It follows from the Cauchy-Schwarz inequality that $1/\beta$-cocoercivity implies $\beta$-Lipschitz continuity: indeed, (1) yields $\|Tx - Ty\|^2 \leq \beta \|x - y\| \, \|Tx - Ty\|$. However, the converse fails; take for instance $T = -\operatorname{Id}$, which is nonexpansive but not firmly nonexpansive. In 1977, Baillon and Haddad showed that, if $C = \mathcal{H}$ and $T$ is the gradient of a convex function, then (1) and (2) coincide. This remarkable result, which has important applications in optimization (see for instance [1, 21]), has become known as the Baillon-Haddad theorem.
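To make the gap between (1) and (2) concrete, the following Python sketch tests both inequalities with $\beta = 1$ on a finite sample of points in $\mathbb{R}^2$. The helper names, the sample grid, and the tolerance are our own illustrative choices; a check on finitely many pairs is only a sanity test, not a proof.

```python
import itertools

def inner(u, v):
    """Euclidean scalar product <u | v> on R^2."""
    return sum(a * b for a, b in zip(u, v))

def diff(u, v):
    """Componentwise difference u - v."""
    return tuple(a - b for a, b in zip(u, v))

def is_lipschitz(T, points, beta, tol=1e-12):
    """Test (2) on every sampled pair: ||Tx - Ty|| <= beta * ||x - y||."""
    for x, y in itertools.product(points, repeat=2):
        d, Td = diff(x, y), diff(T(x), T(y))
        if inner(Td, Td) ** 0.5 > beta * inner(d, d) ** 0.5 + tol:
            return False
    return True

def is_cocoercive(T, points, beta, tol=1e-12):
    """Test (1) on every sampled pair: beta * <x - y | Tx - Ty> >= ||Tx - Ty||^2."""
    for x, y in itertools.product(points, repeat=2):
        d, Td = diff(x, y), diff(T(x), T(y))
        if beta * inner(d, Td) < inner(Td, Td) - tol:
            return False
    return True

# Sample points on a small integer grid in R^2 (an arbitrary choice).
POINTS = [(float(i), float(j)) for i in range(-2, 3) for j in range(-2, 3)]

def neg_id(x):
    """T = -Id: nonexpansive, but not the gradient of a convex function."""
    return tuple(-a for a in x)

def half_id(x):
    """T = Id/2, the gradient of the convex function ||x||^2 / 4."""
    return tuple(0.5 * a for a in x)

if __name__ == "__main__":
    print(is_lipschitz(neg_id, POINTS, 1.0), is_cocoercive(neg_id, POINTS, 1.0))
    print(is_lipschitz(half_id, POINTS, 1.0), is_cocoercive(half_id, POINTS, 1.0))
```

On this sample, $-\operatorname{Id}$ passes the test for (2) and fails the test for (1), while the gradient map $\operatorname{Id}/2$ passes both.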

Theorem 1.1 (Baillon-Haddad) [3, Corollaire 10] Let $f\colon \mathcal{H} \to \mathbb{R}$ be convex, Fréchet differentiable on $\mathcal{H}$, and such that $\nabla f$ is $\beta$-Lipschitz continuous for some $\beta \in \left]0,+\infty\right[$. Then $\nabla f$ is $1/\beta$-cocoercive.

In [3], Theorem 1.1 was derived from a more general result concerning $n$-cyclically monotone operators in normed vector spaces. Since then, direct proofs have been proposed, such as [11, Lemma 6.7], [12, Theorem X.4.2.2], and [18, Proposition 12.60] for Euclidean spaces. These approaches rely on convex analytical and integration arguments. An infinite dimensional proof can be found in [22, Remark 3.5.2], as a corollary to results on the properties of uniformly smooth convex functions.

The goal of our paper is to provide new insights into the Baillon-Haddad theorem. In Section 2, we propose a short new proof of Theorem 1.1 and present additional equivalent conditions, thus making a connection with lesser known parts of Moreau's classical paper [16]. In Section 3, we provide a second order variant of the Baillon-Haddad theorem that partially extends work by Dunn [9].

Notation and background. Our notation is standard: $\Gamma_0(\mathcal{H})$ is the class of proper lower semicontinuous convex functions from $\mathcal{H}$ to $\left]-\infty,+\infty\right]$ and $\Box$ denotes infimal convolution. The conjugate of a function $f\colon \mathcal{H} \to \left]-\infty,+\infty\right]$ is denoted by $f^*$, and its subdifferential by $\partial f$. For background on convex analysis, we refer the reader to [12, 17, 22].

2 An enhanced Baillon-Haddad theorem 

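Let us start with some standard facts on Moreau envelopes and proximity operators; we refer the reader to Moreau's original paper [16] and to [1, 7, 18] for details and complements. Let $\varphi \in \Gamma_0(\mathcal{H})$ and let $\gamma \in \left]0,+\infty\right[$. The Moreau envelope of $\varphi$ of index $\gamma$ is the finite continuous convex function

\[ \operatorname{env}_{\gamma}(\varphi) = \varphi \mathbin{\Box} (q/\gamma), \quad \text{where} \quad q = \tfrac{1}{2}\|\cdot\|^2. \tag{3} \]

Moreau's decomposition asserts that

\[ \operatorname{env}_{1/\gamma}(\varphi) + \operatorname{env}_{\gamma}(\varphi^*) \circ (\gamma \operatorname{Id}) = \gamma q. \tag{4} \]

The proximity operator (or proximal mapping) of $\varphi$ is the operator $\operatorname{Prox}_{\varphi} = (\operatorname{Id} + \partial\varphi)^{-1}$; it maps each $x \in \mathcal{H}$ to the unique minimizer of the function $y \mapsto \varphi(y) + q(x - y)$. The Moreau envelope $\operatorname{env}_{1}(\varphi)$ is Fréchet differentiable with gradient $\nabla \operatorname{env}_{1}(\varphi) = \operatorname{Prox}_{\varphi^*}$. Hence, (4) yields

\[ \nabla \operatorname{env}_{1/\gamma}(\varphi) = \operatorname{Prox}_{\gamma\varphi^*} \circ (\gamma \operatorname{Id}) = \gamma(\operatorname{Id} - \operatorname{Prox}_{\varphi/\gamma}). \tag{5} \]

Moreover,

\[ \operatorname{Prox}_{\varphi}\colon \mathcal{H} \to \mathcal{H} \ \text{is firmly nonexpansive.} \tag{6} \]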
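As an illustration of (4) and (5), consider the one-dimensional case $\mathcal{H} = \mathbb{R}$ and $\varphi = |\cdot|$, for which $\varphi^*$ is the indicator function of $[-1,1]$, $\operatorname{Prox}_{\varphi/\gamma}$ is soft thresholding at level $1/\gamma$, and $\operatorname{Prox}_{\gamma\varphi^*}$ is the projection onto $[-1,1]$. The Python sketch below evaluates both sides of (4) and the two closed forms in (5) at a given point. The choice of $\varphi$, the function names, and the finite-difference comparison suggested afterwards are our own assumptions, made only for illustration.

```python
def q(x):
    """q = (1/2)|.|^2 on the real line."""
    return 0.5 * x * x

def soft_threshold(x, t):
    """Prox of t*|.|: shrink x toward 0 by t."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

def clip(x):
    """Projection onto [-1, 1], i.e. the prox of the indicator of [-1, 1]."""
    return max(-1.0, min(1.0, x))

def env_phi(x, gamma):
    """env_{1/gamma}(phi)(x) = min_y |y| + gamma*q(x - y), attained at Prox_{phi/gamma}(x)."""
    p = soft_threshold(x, 1.0 / gamma)
    return abs(p) + gamma * q(x - p)

def env_phi_conj(z, gamma):
    """env_gamma(phi*)(z) = min_{|y| <= 1} q(z - y)/gamma, attained at the projection of z."""
    return q(z - clip(z)) / gamma

def decomposition_gap(x, gamma):
    """Left-hand side minus right-hand side of Moreau's decomposition (4) at x."""
    return env_phi(x, gamma) + env_phi_conj(gamma * x, gamma) - gamma * q(x)

def gradient_forms(x, gamma):
    """The two closed forms of the gradient of env_{1/gamma}(phi) given in (5)."""
    return clip(gamma * x), gamma * (x - soft_threshold(x, 1.0 / gamma))
```

For any $x$ and $\gamma > 0$, `decomposition_gap(x, gamma)` should vanish up to rounding, and the two values returned by `gradient_forms(x, gamma)` should agree with each other and with a central finite difference of `env_phi`.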

We are now ready to present the main result of this section, which strengthens the conclusion of Theorem 1.1 by providing four additional equivalent conditions and a short new proof.

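Theorem 2.1 Let $f \in \Gamma_0(\mathcal{H})$, let $\beta \in \left]0,+\infty\right[$, and set $h = f^* - q/\beta$. Then the following are equivalent.

(i) $f$ is Fréchet differentiable on $\mathcal{H}$ and $\nabla f$ is $\beta$-Lipschitz continuous.

(ii) $\beta q - f$ is convex.

(iii) $f^* - q/\beta$ is convex (i.e., $f^*$ is $1/\beta$-strongly convex).

(iv) $h \in \Gamma_0(\mathcal{H})$ and $f = \operatorname{env}_{1/\beta}(h^*) = \beta q - \operatorname{env}_{\beta}(h) \circ \beta\operatorname{Id}$.

(v) $h \in \Gamma_0(\mathcal{H})$ and $\nabla f = \operatorname{Prox}_{\beta h} \circ \beta\operatorname{Id} = \beta(\operatorname{Id} - \operatorname{Prox}_{h^*/\beta})$.

(vi) $f$ is Fréchet differentiable on $\mathcal{H}$ and $\nabla f$ is $1/\beta$-cocoercive.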



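Proof. (i)$\Rightarrow$(ii): By Cauchy-Schwarz, $(\forall x \in \mathcal{H})(\forall y \in \mathcal{H})$ $\langle x - y \mid \beta x - \nabla f(x) - \beta y + \nabla f(y) \rangle = \beta\|x - y\|^2 - \langle x - y \mid \nabla f(x) - \nabla f(y) \rangle \geq \|x - y\| \, (\beta\|x - y\| - \|\nabla f(x) - \nabla f(y)\|) \geq 0$. Hence, $\nabla(\beta q - f) = \beta\operatorname{Id} - \nabla f$ is monotone and it follows that $\beta q - f$ is convex (see, e.g., [22, Theorem 2.1.11]).

(ii)$\Rightarrow$(iii): Set $g = \beta q - f$. Then $g \in \Gamma_0(\mathcal{H})$ and therefore $g = g^{**}$. Accordingly,

\[ f = \beta q - g = \beta q - g^{**} = \beta q - \sup_{u \in \operatorname{dom} g^*} \big( \langle \cdot \mid u \rangle - g^*(u) \big) = \inf_{u \in \operatorname{dom} g^*} \big( \beta q - \langle \cdot \mid u \rangle + g^*(u) \big). \tag{7} \]

Hence

\[ f^* = \sup_{u \in \operatorname{dom} g^*} \big( \beta q - \langle \cdot \mid u \rangle + g^*(u) \big)^* = \sup_{u \in \operatorname{dom} g^*} \big( (\beta q - \langle \cdot \mid u \rangle)^* - g^*(u) \big) = \sup_{u \in \operatorname{dom} g^*} \big( q(\cdot + u)/\beta - g^*(u) \big) = q/\beta + \sup_{u \in \operatorname{dom} g^*} \big( (\langle \cdot \mid u \rangle + q(u))/\beta - g^*(u) \big), \tag{8} \]

where the last term is convex as a supremum of affine functions. Thus, $h$ is convex.

(iii)$\Rightarrow$(iv): Since $f \in \Gamma_0(\mathcal{H})$ and $h$ is convex, we have $h \in \Gamma_0(\mathcal{H})$, $h^* \in \Gamma_0(\mathcal{H})$, and $f = f^{**} = (h + q/\beta)^* = h^* \mathbin{\Box} \beta q = \operatorname{env}_{1/\beta}(h^*) = \beta q - \operatorname{env}_{\beta}(h) \circ \beta\operatorname{Id}$, where the last identity follows from (4).

(iv)$\Rightarrow$(v): Use (5).

(v)$\Rightarrow$(vi): By (6), $\operatorname{Prox}_{\beta h}$ is firmly nonexpansive. Hence, it follows from (1) that $\nabla f = \operatorname{Prox}_{\beta h} \circ \beta\operatorname{Id}$ is $1/\beta$-cocoercive.

(vi)$\Rightarrow$(i): Apply the Cauchy-Schwarz inequality.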
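The equivalences of Theorem 2.1 can be observed numerically on a simple example. The sketch below takes $\mathcal{H} = \mathbb{R}$ and the softplus function $f = \log(1 + e^{(\cdot)})$, whose derivative is the logistic sigmoid with Lipschitz constant $\beta = 1/4$ and whose conjugate is the negative binary entropy $u \mapsto u\log u + (1-u)\log(1-u)$ on $[0,1]$. This example, the sampling grids, and the midpoint-convexity test are our own assumptions; the code only tests conditions (i), (ii), (iii), and (vi) on finitely many points.

```python
import math

BETA = 0.25  # Lipschitz constant of the sigmoid (supremum of its derivative)

def softplus(x):
    """f(x) = log(1 + exp(x)), evaluated in a numerically stable way."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def sigmoid(x):
    """Derivative of softplus, evaluated in a numerically stable way."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def g(x):
    """beta*q - f, which is convex by condition (ii)."""
    return BETA * 0.5 * x * x - softplus(x)

def h(u):
    """f* - q/beta on ]0,1[, which is convex by condition (iii)."""
    return u * math.log(u) + (1.0 - u) * math.log(1.0 - u) - 0.5 * u * u / BETA

def midpoint_convex(func, points, tol=1e-12):
    """Necessary condition for convexity: func((x+y)/2) <= (func(x)+func(y))/2."""
    return all(func(0.5 * (x + y)) <= 0.5 * (func(x) + func(y)) + tol
               for x in points for y in points)

def lipschitz_ok(points, beta, tol=1e-12):
    """Condition (i): |f'(x) - f'(y)| <= beta |x - y| on the sample."""
    return all(abs(sigmoid(x) - sigmoid(y)) <= beta * abs(x - y) + tol
               for x in points for y in points)

def cocoercive_ok(points, beta, tol=1e-12):
    """Condition (vi): beta (x - y)(f'(x) - f'(y)) >= (f'(x) - f'(y))^2 on the sample."""
    return all(beta * (x - y) * (sigmoid(x) - sigmoid(y))
               >= (sigmoid(x) - sigmoid(y)) ** 2 - tol
               for x in points for y in points)

XS = [-6.0 + 0.25 * k for k in range(49)]  # sample of the real line
US = [0.02 * k for k in range(1, 50)]      # sample of ]0, 1[
```

With these definitions, `lipschitz_ok(XS, BETA)`, `midpoint_convex(g, XS)`, `midpoint_convex(h, US)`, and `cocoercive_ok(XS, BETA)` are all expected to hold, whereas `cocoercive_ok(XS, 0.2)` is expected to fail since $0.2$ is smaller than the Lipschitz constant.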



Remark 2.2 Some comments regarding Theorem 2.1 are in order.

(a) The proof of the implication (i)$\Rightarrow$(vi), i.e., of the Baillon-Haddad theorem (Theorem 1.1), appears to be new and shorter than those found in the literature. In addition, Theorem 2.1 brings to light various characterizations of the Lipschitz continuity of the gradient of a convex function. The equivalences (ii)$\Leftrightarrow$(iii)$\Leftrightarrow$(iv) are due to Moreau, who established them (for $\beta = 1$) in [16, Proposition 9.b] (see also [13, Corollary 3]). On the other hand, the equivalences (i)$\Leftrightarrow$(iii)$\Leftrightarrow$(iv)$\Leftrightarrow$(vi) are shown in Euclidean spaces in [18, Proposition 12.60] with different techniques.

(b) Set $\beta = 1$. The conclusion of Theorem 1.1 is that $\nabla f\colon \mathcal{H} \to \mathcal{H}$ is firmly nonexpansive. Hence, since the class of firmly nonexpansive operators with domain $\mathcal{H}$ coincides with that of resolvents of maximal monotone operators [10, Section 1.11], we have $\nabla f = (\operatorname{Id} + A)^{-1}$, for some maximal monotone operator $A\colon \mathcal{H} \to 2^{\mathcal{H}}$. However, Theorem 2.1(v) more precisely reveals $\nabla f$ to be the proximity operator of $h$, i.e., $A = \partial h = \partial f^* - \operatorname{Id}$.



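(c) Let $f_1 \in \Gamma_0(\mathcal{H})$, let $f_2\colon \mathcal{H} \to \mathbb{R}$ be convex and differentiable with a Lipschitz continuous gradient, and consider the problem of minimizing $f_1 + f_2$. Without loss of generality (rescale), we assume that the Lipschitz constant of $\nabla f_2$ is $\beta = 1$. A standard algorithm for solving this problem is the forward-backward algorithm [1, 20]

\[ x_0 \in \mathcal{H} \quad \text{and} \quad (\forall n \in \mathbb{N}) \quad x_{n+1} = \operatorname{Prox}_{\gamma_n f_1}\big(x_n - \gamma_n \nabla f_2(x_n)\big), \quad 0 < \gamma_n < 2. \tag{9} \]

Now set $h_2 = f_2^* - q$. Then it follows from the implication (i)$\Rightarrow$(v) that $\nabla f_2 = \operatorname{Id} - \operatorname{Prox}_{h_2^*}$. Hence, we can rewrite (9) as

\[ x_0 \in \mathcal{H} \quad \text{and} \quad (\forall n \in \mathbb{N}) \quad x_{n+1} = \operatorname{Prox}_{\gamma_n f_1}\big((1 - \gamma_n)x_n + \gamma_n \operatorname{Prox}_{h_2^*} x_n\big), \quad 0 < \gamma_n < 2. \tag{10} \]

This shows that the forward-backward algorithm (9) is actually a backward-backward algorithm. In particular, for $\gamma_n \equiv 1$, we recover the basic backward-backward iteration $x_{n+1} = \operatorname{Prox}_{f_1} \operatorname{Prox}_{h_2^*} x_n$.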
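The identity between (9) and (10) can be checked on a toy problem. In the sketch below we take $\mathcal{H} = \mathbb{R}$, $f_1 = \lambda|\cdot|$, and $f_2$ the Huber function centered at $b$, so that $\nabla f_2$ is the projection of $x - b$ onto $[-1,1]$, $h_2^* = |\cdot - b|$, and $\operatorname{Prox}_{h_2^*}$ is a shifted soft thresholding. The data `B` and `LAM`, the constant step size, and all function names are hypothetical choices of ours, not taken from the references.

```python
def soft(x, t):
    """Soft thresholding: the proximity operator of t*|.|."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0

def clip(x):
    """Projection onto [-1, 1]."""
    return max(-1.0, min(1.0, x))

B, LAM = 3.0, 0.5  # hypothetical data: f1 = LAM*|x|, f2 = Huber(x - B)

def grad_f2(x):
    """Gradient of the Huber function centered at B (1-Lipschitz)."""
    return clip(x - B)

def prox_h2_star(x):
    """Prox of h2* = |. - B|, where h2 = f2* - q."""
    return B + soft(x - B, 1.0)

def prox_f1(x, gamma):
    """Prox of gamma*f1 = gamma*LAM*|.|."""
    return soft(x, gamma * LAM)

def forward_backward(x0, gamma, n):
    """Iteration (9) with constant step size gamma in ]0, 2[."""
    xs = [x0]
    for _ in range(n):
        x = xs[-1]
        xs.append(prox_f1(x - gamma * grad_f2(x), gamma))
    return xs

def backward_backward(x0, gamma, n):
    """Iteration (10): the same update written with two proximity operators."""
    xs = [x0]
    for _ in range(n):
        x = xs[-1]
        xs.append(prox_f1((1.0 - gamma) * x + gamma * prox_h2_star(x), gamma))
    return xs
```

For instance, `forward_backward(10.0, 1.5, 100)` and `backward_backward(10.0, 1.5, 100)` should produce the same iterates up to rounding, and both should approach the minimizer $x = 2.5$ of $f_1 + f_2$, which solves $\lambda + (x - b) = 0$ for the data above.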



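We conclude this section with an alternative formulation of the Baillon-Haddad theorem that brings into play Bregman distances. Recall that if $\varphi \in \Gamma_0(\mathcal{H})$ is Gâteaux differentiable on $\operatorname{int}\operatorname{dom}\varphi \neq \varnothing$, the associated Bregman distance is defined by

\[ D_{\varphi}\colon \mathcal{H} \times \mathcal{H} \to [0,+\infty]\colon (x,y) \mapsto \begin{cases} \varphi(x) - \varphi(y) - \langle x - y \mid \nabla\varphi(y) \rangle, & \text{if } y \in \operatorname{int}\operatorname{dom}\varphi; \\ +\infty, & \text{otherwise.} \end{cases} \tag{11} \]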



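Corollary 2.3 Let $\beta \in \left]0,+\infty\right[$, and let $f\colon \mathcal{H} \to \mathbb{R}$ be convex, Fréchet differentiable on $\mathcal{H}$, and such that $f^*$ is Gâteaux differentiable on $\operatorname{int}\operatorname{dom} f^* \neq \varnothing$. Then the following are equivalent.

(i) $\nabla f$ is $\beta$-Lipschitz continuous.

(ii) $(\forall x \in \mathcal{H})(\forall y \in \mathcal{H})$ $D_f(x,y) \leq \beta q(x - y)$.

(iii) $(\forall x^* \in \mathcal{H})(\forall y^* \in \mathcal{H})$ $\beta D_{f^*}(x^*,y^*) \geq q(x^* - y^*)$.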



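Proof. (i)$\Leftrightarrow$(ii): Set $g = \beta q - f$. Then $g$ is Fréchet differentiable on $\operatorname{dom} g = \mathcal{H}$ and $\nabla g = \beta\operatorname{Id} - \nabla f$. Hence, it follows from the equivalence (i)$\Leftrightarrow$(ii) in Theorem 2.1 and (11) that (i) $\Leftrightarrow$ $g \in \Gamma_0(\mathcal{H})$ is Fréchet differentiable on $\operatorname{int}\operatorname{dom} g = \mathcal{H}$ $\Leftrightarrow$ $(\forall x \in \mathcal{H})(\forall y \in \mathcal{H})$ $D_g(x,y) \geq 0$ $\Leftrightarrow$ $(\forall x \in \mathcal{H})(\forall y \in \mathcal{H})$ $D_f(x,y) \leq \beta q(x - y)$.

(i)$\Leftrightarrow$(iii): Set $h = f^* - q/\beta$. Then $h$ is Gâteaux differentiable on $\operatorname{int}\operatorname{dom} h = \operatorname{int}\operatorname{dom} f^*$, with $\nabla h = \nabla f^* - (1/\beta)\operatorname{Id}$. Hence, in view of the equivalence (i)$\Leftrightarrow$(iii) in Theorem 2.1 and (11), (i) $\Leftrightarrow$ $h \in \Gamma_0(\mathcal{H})$ is Gâteaux differentiable on $\operatorname{int}\operatorname{dom} h = \operatorname{int}\operatorname{dom} f^*$ $\Leftrightarrow$ $(\forall x^* \in \mathcal{H})(\forall y^* \in \mathcal{H})$ $D_h(x^*,y^*) \geq 0$ $\Leftrightarrow$ $(\forall x^* \in \mathcal{H})(\forall y^* \in \mathcal{H})$ $D_{f^*}(x^*,y^*) \geq q(x^* - y^*)/\beta$. ■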
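The Bregman inequalities of Corollary 2.3 can be tested on an explicit pair $(f, f^*)$. The sketch below uses $\mathcal{H} = \mathbb{R}$ and $f = \sqrt{1 + (\cdot)^2}$, for which $\beta = 1$ and $f^* = -\sqrt{1 - (\cdot)^2}$ on $[-1,1]$ (so that $\operatorname{int}\operatorname{dom} f^* = \left]-1,1\right[$). The example and the sampling grids are our own assumptions; points outside $\operatorname{dom} f^*$ are not sampled since the inequality in (iii) then holds trivially with value $+\infty$.

```python
import math

def q(x):
    """q = (1/2)|.|^2 on the real line."""
    return 0.5 * x * x

def f(x):
    return math.sqrt(1.0 + x * x)

def grad_f(x):
    """f'(x); its derivative (1 + x^2)^(-3/2) is bounded by beta = 1."""
    return x / math.sqrt(1.0 + x * x)

def f_conj(u):
    """Conjugate f*(u) = -sqrt(1 - u^2), valid for |u| <= 1."""
    return -math.sqrt(1.0 - u * u)

def grad_f_conj(u):
    """(f*)'(u), valid on the interior ]-1, 1[ of dom f*."""
    return u / math.sqrt(1.0 - u * u)

def bregman(phi, grad_phi, x, y):
    """Bregman distance (11) for y in the interior of dom phi."""
    return phi(x) - phi(y) - (x - y) * grad_phi(y)

XS = [-5.0 + 0.25 * k for k in range(41)]   # sample of the real line
US = [-0.95 + 0.05 * k for k in range(39)]  # sample of ]-1, 1[

def check_ii(tol=1e-12):
    """Condition (ii) with beta = 1: D_f(x, y) <= q(x - y)."""
    return all(bregman(f, grad_f, x, y) <= q(x - y) + tol
               for x in XS for y in XS)

def check_iii(tol=1e-12):
    """Condition (iii) with beta = 1: D_{f*}(x*, y*) >= q(x* - y*)."""
    return all(bregman(f_conj, grad_f_conj, u, v) >= q(u - v) - tol
               for u in US for v in US)
```

Both `check_ii()` and `check_iii()` are expected to return `True`, in agreement with the fact that $\nabla f$ is nonexpansive.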



3 A second order Baillon-Haddad theorem 



Under the more restrictive assumption that the underlying convex function is twice continuously differentiable, we shall obtain in Theorem 3.3 a very short and transparent proof inspired by the work of Dunn [9]. We require two preliminary propositions.

Proposition 3.1 Let $C$ be a nonempty open convex subset of $\mathcal{H}$, let $\mathcal{B}$ be a real Banach space, and let $G\colon C \to \mathcal{B}$ be continuously Fréchet differentiable on $C$. Then $G$ is nonexpansive if and only if $(\forall x \in C)$ $\|\nabla G(x)\| \leq 1$.

Proof. Let $x \in C$ and let $y \in \mathcal{H}$. Suppose that $G$ is nonexpansive. For every $t \in \left]0,+\infty\right[$ sufficiently small, $x + ty \in C$ and hence $\|G(x + ty) - G(x)\|/t \leq \|y\|$. Letting $t \downarrow 0$, we deduce that $\|(\nabla G(x))y\| \leq \|y\|$. Since $y$ was chosen arbitrarily, we conclude that $\|\nabla G(x)\| \leq 1$. Conversely, if $y \in C$, we derive from the mean value theorem (see, e.g., [O Theorem 5.1.12]) that $\|G(y) - G(x)\| \leq \|y - x\| \sup_{z \in [x,y]} \|\nabla G(z)\| \leq \|y - x\|$. ■

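Let $A\colon \mathcal{H} \to \mathcal{H}$ and $B\colon \mathcal{H} \to \mathcal{H}$ be self-adjoint bounded linear operators. Then $A$ is positive, written $A \succeq 0$, if $(\forall x \in \mathcal{H})$ $\langle x \mid Ax \rangle \geq 0$. We write $A \succeq B$ if $A - B \succeq 0$. The following result is part of the folklore.

Proposition 3.2 Let $A\colon \mathcal{H} \to \mathcal{H}$ be a bounded self-adjoint linear operator. Then $\|A\| \leq 1$ if and only if $\operatorname{Id} \succeq A \succeq -\operatorname{Id}$.

Proof. Assume that $\mathcal{H} \neq \{0\}$ and set $S = \{x \in \mathcal{H} \mid \|x\| = 1\}$. Then $\operatorname{Id} \succeq A$ $\Leftrightarrow$ $(\forall x \in \mathcal{H})$ $\langle x \mid x \rangle \geq \langle x \mid Ax \rangle$ $\Leftrightarrow$ $(\forall x \in S)$ $1 = \langle x \mid x \rangle \geq \langle x \mid Ax \rangle$. Similarly, $A \succeq -\operatorname{Id}$ $\Leftrightarrow$ $(\forall x \in S)$ $\langle x \mid Ax \rangle \geq -1$. Hence $\operatorname{Id} \succeq A \succeq -\operatorname{Id}$ $\Leftrightarrow$ $(\forall x \in S)$ $|\langle x \mid Ax \rangle| \leq 1$ $\Leftrightarrow$ $\|A\| = \sup_{x \in S} |\langle x \mid Ax \rangle| \leq 1$. ■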

The main result of this section is a Baillon-Haddad theorem for twice continuously Fréchet differentiable convex functions. It extends [9, Theorem 4], which assumed in addition that $f$ has full domain and uniformly bounded Hessians.

Theorem 3.3 Let $C$ be a nonempty open convex subset of $\mathcal{H}$, let $f\colon C \to \mathbb{R}$ be convex and twice continuously Fréchet differentiable on $C$, and let $\beta \in \left]0,+\infty\right[$. Then $\nabla f$ is $\beta$-Lipschitz continuous if and only if it is $1/\beta$-cocoercive.



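Proof. Define two operators on $C$ by $G = (1/\beta)\nabla f$ and by $H = \nabla G = (1/\beta)\nabla^2 f$. Under our assumptions, the convexity of $f$ is characterized by [22, Theorem 2.1.11]

\[ (\forall x \in C) \quad H(x) \succeq 0. \tag{12} \]

Hence,

\begin{align*}
\nabla f \text{ is $\beta$-Lipschitz continuous} &\Leftrightarrow G \text{ is nonexpansive} \\
&\Leftrightarrow (\forall x \in C) \;\; \|H(x)\| \leq 1 && \text{(by Proposition 3.1)} \\
&\Leftrightarrow (\forall x \in C) \;\; -\operatorname{Id} \preceq H(x) \preceq \operatorname{Id} && \text{(by Proposition 3.2)} \\
&\Leftrightarrow (\forall x \in C) \;\; 0 \preceq H(x) \preceq \operatorname{Id} && \text{(by (12))} \\
&\Leftrightarrow (\forall x \in C) \;\; -\operatorname{Id} \preceq 2H(x) - \operatorname{Id} \preceq \operatorname{Id} \\
&\Leftrightarrow (\forall x \in C) \;\; \|2H(x) - \operatorname{Id}\| \leq 1 && \text{(by Proposition 3.2)} \\
&\Leftrightarrow 2G - \operatorname{Id} \text{ is nonexpansive} && \text{(by Proposition 3.1)} \\
&\Leftrightarrow G \text{ is firmly nonexpansive} && \text{(by [10, Lemma 1.11.1])} \\
&\Leftrightarrow \nabla f \text{ is $1/\beta$-cocoercive,} && \text{(by (1))}
\end{align*}

and we obtain the conclusion. ■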
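The chain of equivalences in the proof of Theorem 3.3 can be followed on a concrete smooth function. The sketch below uses $\mathcal{H} = C = \mathbb{R}^2$ and the log-sum-exp function $f(x) = \log(e^{x_1} + e^{x_2})$, whose gradient is the softmax map $p$ and whose Hessian equals $p_1 p_2 \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}$, with eigenvalues $0$ and $2p_1p_2 \leq 1/2$, so that $\beta = 1/2$. This example, the sampling grid, and the function names are our own assumptions, chosen only to illustrate the statement.

```python
import math

BETA = 0.5  # Lipschitz constant of the softmax map on R^2

def softmax2(x):
    """Gradient of log-sum-exp on R^2, computed with the usual max shift."""
    m = max(x)
    e0, e1 = math.exp(x[0] - m), math.exp(x[1] - m)
    s = e0 + e1
    return (e0 / s, e1 / s)

def scaled_hessian_eigs(x):
    """Eigenvalues of H(x) = (1/BETA) * Hessian of f at x, namely 0 and 2*p1*p2/BETA."""
    p = softmax2(x)
    return (0.0, 2.0 * p[0] * p[1] / BETA)

def hessian_bounds_ok(points, tol=1e-12):
    """Check 0 <= H(x) <= Id and -Id <= 2H(x) - Id <= Id through the eigenvalues."""
    for x in points:
        for lam in scaled_hessian_eigs(x):
            if not (-tol <= lam <= 1.0 + tol):
                return False
            if not (-1.0 - tol <= 2.0 * lam - 1.0 <= 1.0 + tol):
                return False
    return True

def cocoercive_ok(points, beta, tol=1e-12):
    """Check (1) for T = softmax2: beta <x - y | Tx - Ty> >= ||Tx - Ty||^2."""
    for x in points:
        for y in points:
            px, py = softmax2(x), softmax2(y)
            d = (x[0] - y[0], x[1] - y[1])
            Td = (px[0] - py[0], px[1] - py[1])
            lhs = beta * (d[0] * Td[0] + d[1] * Td[1])
            if lhs < Td[0] ** 2 + Td[1] ** 2 - tol:
                return False
    return True

POINTS = [(float(i), float(j)) for i in range(-2, 3) for j in range(-2, 3)]
```

On this sample, `hessian_bounds_ok(POINTS)` and `cocoercive_ok(POINTS, BETA)` are expected to hold, while `cocoercive_ok(POINTS, 0.4)` is expected to fail because $0.4$ underestimates the Lipschitz constant.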



In linear functional analysis, the following property is usually obtained via spectral theory. 

Corollary 3.4 Let $A\colon \mathcal{H} \to \mathcal{H}$ be a positive self-adjoint bounded linear operator. Then $(\forall x \in \mathcal{H})$ $\|A\| \, \langle x \mid Ax \rangle \geq \|Ax\|^2$.

Proof. This is an application of Theorem 3.3 with $f\colon \mathcal{H} \to \mathbb{R}\colon x \mapsto \langle x \mid Ax \rangle/2$. Indeed, $f$ is twice continuously Fréchet differentiable on $\mathcal{H}$ with $\nabla f = A$, which is $\|A\|$-Lipschitz continuous. ■
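In finite dimensions, Corollary 3.4 is easy to test directly. The sketch below works in $\mathbb{R}^2$ with a symmetric positive definite matrix `A_POS` and, for contrast, a symmetric indefinite matrix `A_INDEF`; both matrices, the sample grid, and the helper names are hypothetical choices of ours. The operator norm of a symmetric $2 \times 2$ matrix is computed from the closed form of its eigenvalues.

```python
import math

A_POS = ((2.0, 1.0), (1.0, 1.0))     # symmetric, eigenvalues (3 +/- sqrt(5))/2 > 0
A_INDEF = ((1.0, 0.0), (0.0, -1.0))  # symmetric but not positive

def matvec(A, x):
    """Matrix-vector product for a 2x2 matrix."""
    return (A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1])

def inner(u, v):
    """Euclidean scalar product on R^2."""
    return u[0] * v[0] + u[1] * v[1]

def sym2_norm(A):
    """Operator norm of a symmetric 2x2 matrix: the largest |eigenvalue|."""
    a, b, d = A[0][0], A[0][1], A[1][1]
    mean = 0.5 * (a + d)
    rad = math.hypot(0.5 * (a - d), b)
    return max(abs(mean + rad), abs(mean - rad))

def corollary_holds(A, points, tol=1e-9):
    """Check ||A|| <x | Ax> >= ||Ax||^2 on the sampled points."""
    nrm = sym2_norm(A)
    for x in points:
        Ax = matvec(A, x)
        if nrm * inner(x, Ax) < inner(Ax, Ax) - tol:
            return False
    return True

POINTS = [(float(i), float(j)) for i in range(-3, 4) for j in range(-3, 4)]
```

Here `corollary_holds(A_POS, POINTS)` is expected to be `True`, whereas `corollary_holds(A_INDEF, POINTS)` is expected to be `False` (take $x = (0,1)$), which shows that the positivity assumption cannot be dropped.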

Remark 3.5 It would be interesting to see whether Theorem 3.3 holds true when the second-order assumption is replaced by Fréchet differentiability. However, the natural approach by approximation does not appear to be applicable; see [H Section 5] for pertinent comments.